Pipelining Computation and Optimization Strategies for Scaling GROMACS on the Sunway Many-Core Processor

نویسندگان

  • Yang Yu
  • Hong An
  • Junshi Chen
  • Weihao Liang
  • Qingqing Xu
  • Yong Chen
چکیده

The increasing gap between plentiful computing elements and limited memory bandwidth makes it increasingly difficult and sometimes even infeasible for HPC community to port more applications onto many-core processor archi‐ tectures. The Sunway many-core processor SW26010 used to build the Sunway TaihuLight System contains a total of 260 heterogeneous cores. All these cores can be divided into 4 core groups (CGs). Each CG includes a Management Processing Element (MPE) core and 64 Computing Processing Elements (CPEs) cores. In this paper, we refactor an important molecular dynamics (MD) appli‐ cation GROMACS on the Sunway Taihulight system. By rewriting the computeintensive kernel of GROMACS, we exploit a suitable parallelism for CPE cluster and implement pipelining computation between MPE and CPE cluster. Optimi‐ zation strategies including the efficient use of scratchpad, the software-emulated cache and a hybrid parallel algorithm are adopted to solve the challenging memory bandwidth limitation. When comparing the refactored version using MPE and 64 CPEs with the original ported version using only MPE, we achieve a 16x speedup for the compute-intensive kernel. For simulating a molecule with 3 million atoms, we currently have managed to scale to 798,720 cores. Moreover, we analyze the adaptability of our mapping and optimization strategies for solving the memory bandwidth limitation when refactoring a real-world application on the Sunway heterogeneous many-core processor system.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications

Background and Objectives: Digital signal processors are widely used in energy constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work and in the first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...

متن کامل

Scalability study of molecular dynamics simulation on Godson-T many-core architecture

Molecular dynamics (MD) simulation has broad applications, and an increasing amount of computing power is needed to satisfy the large scale of the real world simulation. The advent of the many-core paradigm brings unprecedented computing power, but it remains a great challenge to harvest the computing power due to MD’s irregular memory-access pattern. To address this challenge, this paper prese...

متن کامل

A Multi-Core Pipelined Architecture for Parallel Computing

Parallel programming on multi-core processors has become the industry’s biggest software challenge. This paper proposes a novel parallel architecture for executing sequential programs using multi-core pipelining based on program slicing by a new memory/cache dynamic management technology. The new architecture is very suitable for processing large geospatial data in parallel without parallel pro...

متن کامل

A Performance/Energy Analysis and Optimization of Multi-Core Architectures with Voltage Scaling Techniques

In this paper, we develop asymptotic analysis and simulation models to better understand the characteristics of performance and energy consumption in a multi-core processor design in which dynamic voltage scaling is used. Our asymptotic model is derived using Amdahl’s law, Rent’s rule and power equations to derive the optimum number of cores and their voltage levels. Our model can predict the p...

متن کامل

Combining Coarse-Grained Software Pipelining with DVS for Scheduling Real-Time Periodic Dependent Tasks on Multi-Core Embedded Systems

In this paper, we combine coarse-grained software pipelining with DVS (Dynamic Voltage/Frequency Scaling) for optimizing energy consumption of streambased multimedia applications on multi-core embedded systems. By exploiting the potential of multi-core architecture and the characteristic of streaming applications, we propose a two-phase approach to solve the energy minimization problem for peri...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017